The article introduces Fast-dLLM, a method that accelerates diffusion-based large language models (LLMs) through a block-wise approximate Key-Value (KV) cache and a confidence-aware parallel decoding strategy. Together, these address the slow inference of diffusion LLMs and mitigate the quality degradation that naive parallel token decoding causes, yielding up to 27.6x higher throughput while maintaining accuracy and making practical deployment of diffusion LLMs more feasible.
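To make the confidence-aware idea concrete, below is a minimal, hypothetical sketch of one parallel decoding step for a masked diffusion LLM: only masked positions whose top-1 probability clears a threshold are committed in this step, while the rest are left for later denoising steps. Function and parameter names (`decode_step`, `confidence_threshold`) are illustrative assumptions, not Fast-dLLM's actual API.

```python
# Illustrative sketch only, not Fast-dLLM's implementation.
import torch

def decode_step(logits: torch.Tensor,
                tokens: torch.Tensor,
                mask_id: int,
                confidence_threshold: float = 0.9) -> torch.Tensor:
    """Commit, in parallel, only the masked positions whose top-1
    probability exceeds the confidence threshold.

    logits: (seq_len, vocab_size) model outputs for the current block
    tokens: (seq_len,) current sequence with mask_id at undecided positions
    """
    probs = torch.softmax(logits, dim=-1)
    confidence, candidates = probs.max(dim=-1)   # top-1 prob and token per position
    masked = tokens == mask_id                   # positions still undecided
    accept = masked & (confidence >= confidence_threshold)
    # Commit at least one token per step so decoding always makes progress,
    # even when no position clears the threshold.
    if masked.any() and not accept.any():
        best = torch.where(masked, confidence,
                           torch.full_like(confidence, -1.0)).argmax()
        accept[best] = True
    new_tokens = tokens.clone()
    new_tokens[accept] = candidates[accept]
    return new_tokens
```

In this sketch, the threshold trades speed for fidelity: a lower threshold commits more tokens per step (faster, riskier), while a higher one defers uncertain positions to later steps.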
Tags: diffusion, language models, parallel decoding